FunFEM
=========================

.. _funfem-label:

This page references the `official documentation of FunFEM `_.

Method Description
------------------

FunFEM is a model-based clustering method specifically designed for functional data, such as time series.
It employs a discriminative functional mixture (DFM) model that projects the observed curves into a latent
functional subspace, where clustering is performed. The key steps of the method are:

- Functional Data Representation

  Each observed curve is first smoothed using a basis expansion (e.g., Fourier or spline basis),
  converting discrete observations into continuous functional forms.

- Discriminative Subspace Learning

  A low-dimensional discriminative subspace is identified via a generalized eigenvalue problem,
  maximizing between-cluster variance while minimizing within-cluster variance (Fisher's criterion).

- Model Inference (FunFEM Algorithm)

  An iterative Expectation-Maximization (EM)-like algorithm alternates between:

  1. F-step: Update the discriminative subspace orientation.
  2. M-step: Estimate cluster parameters (means, covariances, and noise variances).
  3. E-step: Compute posterior cluster membership probabilities for each curve.

- Model Selection

  The optimal number of clusters (K) and intrinsic dimensionality (d) are selected using the slope heuristic,
  a data-driven penalty calibration method that is reported to outperform BIC and AIC in practice.

- Sparse Basis Selection

  Optionally, sparsity-inducing regularization (l\ :sub:`1` penalty) is applied to select the most
  discriminative basis functions (e.g., key time intervals or frequencies) for interpretability.

Function
--------------

This method provides three core functions: **fem_sim_data**, **fem_bifunc**, and **FDPlot.fem_fdplot**.
This section describes the usage, parameters, output values and a usage example for each function.

fem_sim_data
~~~~~~~~~~~~~~~

**fem_sim_data** loads real-world data sourced from the French bike-sharing system.

.. code-block:: python

    fem_sim_data()

Parameter
^^^^^^^^^^

The data are loaded internally; the function has no adjustable parameters.

Value
^^^^^^^^^

The function **fem_sim_data** returns a dict containing the French bike-sharing system data.

- **data**: the loading profiles (number of available bikes / number of bike docks) of the 345 stations at 181 times.

- **pos**: the longitude and latitude of the 345 bike stations.

- **dates**: the download dates.

- **bonus**: indicates whether the station is on a hill (bonus = 1).

- **names**: the names of the stations.

Example
^^^^^^^^

.. code-block:: python

    from BiFuncLib.simulation_data import fem_sim_data

    fem_simdata = fem_sim_data()

fem_bifunc
~~~~~~~~~~~~~

**fem_bifunc** performs model fitting.

.. code-block:: python

    fem_bifunc(fd, K = np.arange(2, 7), model = ['AkjBk'], crit = 'bic', init = 'kmeans',
               Tinit = (), maxit = 50, eps = 1e-6, disp = False, lambda_ = 0, graph = False)

Parameter
^^^^^^^^^^

.. list-table::
   :widths: 30 70
   :header-rows: 1
   :align: center

   * - Parameter
     - Description
   * - **fd**
     - dict, a functional data dict produced by the GENetLib package.
   * - **K**
     - integer or list, a sequence specifying the numbers of mixture components (clusters) among which the model selection criterion will choose the most appropriate number of groups. Default is np.arange(2, 7), i.e. 2 to 6.
   * - **model**
     - list, a list defining the discriminative latent mixture (DLM) models to fit. There are 12 different models: "DkBk", "DkB", "DBk", "DB", "AkjBk", "AkjB", "AkBk", "AkB", "AjBk", "AjB", "ABk", "AB". Users may supply any subset of models as a list; the best result is selected according to the specified criterion.
   * - **crit**
     - character, the criterion to be used for model selection ('bic', 'aic' or 'icl'). 'bic' is the default.
   * - **init**
     - character, the initialization type ('random', 'kmeans' or 'hclust'). 'kmeans' is the default.
   * - **Tinit**
     - array, an n x K matrix which contains posterior probabilities for initializing the algorithm (each row corresponds to an individual). It is supplied together with **init='user'** in the example below. Default is ().
   * - **maxit**
     - integer, the maximum number of iterations before the Fisher-EM algorithm stops. Default is 50.
   * - **eps**
     - numeric, the threshold on the likelihood differences used to stop the Fisher-EM algorithm. Default is 1e-6.
   * - **disp**
     - bool, if True, some messages are printed during the clustering. Default is False.
   * - **lambda_**
     - numeric, the l\ :sub:`1` penalty (between 0 and 1) for the sparse version. Default is 0.
   * - **graph**
     - bool, if True, plot the evolution of the log-likelihood. Default is False.

Value
^^^^^^^^^

The function **fem_bifunc** returns a dict containing the clustering results and information about the model.

- **model**: the model name.

- **K**: the number of groups.

- **cls**: the group membership of each individual estimated by the Fisher-EM algorithm.

- **P**: the posterior probabilities of each individual for each group.

- **prms**: the model parameters.

- **U**: the orientation of the functional subspace according to the basis functions.

- **aic**: the value of the Akaike information criterion.

- **bic**: the value of the Bayesian information criterion.

- **icl**: the value of the integrated completed likelihood criterion.

- **loglik**: the log-likelihood values computed at each iteration of the FEM algorithm.

- **ll**: the log-likelihood value obtained at the last iteration of the FEM algorithm.

- **nbprm**: the number of free parameters in the model.

- **crit**: the model selection criterion used.

- **allCriterions**: stores the criterion values for all models under every combination of **K** and **init**.

If **disp=True**, the following information is printed.

.. image:: /_static/fem_res.png
   :width: 700
   :align: center

If **graph=True**, a plot of the log-likelihood versus iteration number is displayed.

.. image:: /_static/fem_ll.png
   :width: 400
   :align: center

Example
^^^^^^^^

.. code-block:: python

    import numpy as np
    from BiFuncLib.fem_bifunc import fem_bifunc
    from BiFuncLib.simulation_data import fem_sim_data
    from BiFuncLib.BsplineFunc import BsplineFunc
    from GENetLib.fda_func import create_fourier_basis

    fem_simdata = fem_sim_data()

    # Create fd object: smooth the 181 observations of each station on a Fourier basis
    basis = create_fourier_basis((0, 181), nbasis=25)
    time_grid = np.arange(1, 182).tolist()
    fdobj = BsplineFunc(basis).smooth_basis(time_grid, np.array(fem_simdata['data'].T))['fd']

    # Clustering: fit three candidate models for K = 5 and 6, select by ICL
    res = fem_bifunc(fdobj, K=[5, 6], model=['AkjBk', 'DkBk', 'DB'], crit='icl',
                     init='hclust', lambda_=0.01, disp=True)

    # Another setting: restart from the posterior probabilities of the first fit
    res2 = fem_bifunc(fdobj, K=[res['K']], model=['AkjBk', 'DkBk'], init='user',
                      Tinit=res['P'], lambda_=0.01, disp=True, graph=True)

FDPlot.fem_fdplot
~~~~~~~~~~~~~~~~~~

**FDPlot.fem_fdplot** produces visualizations.

.. code-block:: python

    FDPlot(result).fem_fdplot(data, fdobj)

Parameter
^^^^^^^^^^

.. list-table::
   :widths: 30 70
   :header-rows: 1
   :align: center

   * - Parameter
     - Description
   * - **result**
     - dict, a clustering result generated by the **fem_bifunc** function.
   * - **data**
     - dict, a data set loaded by the **fem_sim_data** function.
   * - **fdobj**
     - dict, the fd object passed as the first input to the **fem_bifunc** function.

Value
^^^^^^^^^

The function **FDPlot.fem_fdplot** reconstructs the functional profiles for each cluster and displays
a scatter plot that visualizes the distribution of data samples across the clusters.

For each cluster category:

.. table::
   :class: tight-table

   +----------+----------+----------+
   | |fig1|   | |fig2|   | |fig3|   |
   +----------+----------+----------+
   | |fig4|   | |fig5|   | |fig6|   |
   +----------+----------+----------+

.. |fig1| image:: /_static/fem_clus1.png
   :width: 300px
.. |fig2| image:: /_static/fem_clus2.png
   :width: 300px
.. |fig3| image:: /_static/fem_clus3.png
   :width: 300px
.. |fig4| image:: /_static/fem_clus4.png
   :width: 300px
.. |fig5| image:: /_static/fem_clus5.png
   :width: 300px
.. |fig6| image:: /_static/fem_clus6.png
   :width: 300px

And a scatter plot:

.. image:: /_static/fem_cluster.png
   :width: 400
   :align: center

Example
^^^^^^^^

.. code-block:: python

    import numpy as np
    from BiFuncLib.fem_bifunc import fem_bifunc
    from BiFuncLib.simulation_data import fem_sim_data
    from BiFuncLib.BsplineFunc import BsplineFunc
    from GENetLib.fda_func import create_fourier_basis
    from BiFuncLib.FDPlot import FDPlot

    fem_simdata = fem_sim_data()

    # Create fd object: smooth the 181 observations of each station on a Fourier basis
    basis = create_fourier_basis((0, 181), nbasis=25)
    time_grid = np.arange(1, 182).tolist()
    fdobj = BsplineFunc(basis).smooth_basis(time_grid, np.array(fem_simdata['data'].T))['fd']

    # Clustering: fit three candidate models for K = 5 and 6, select by ICL
    res = fem_bifunc(fdobj, K=[5, 6], model=['AkjBk', 'DkBk', 'DB'], crit='icl',
                     init='hclust', lambda_=0.01, disp=True)

    # Another setting: restart from the posterior probabilities of the first fit
    res2 = fem_bifunc(fdobj, K=[res['K']], model=['AkjBk', 'DkBk'], init='user',
                      Tinit=res['P'], lambda_=0.01, disp=True, graph=True)

    # Plot the cluster profiles and the scatter plot of the stations
    FDPlot(res).fem_fdplot(fem_simdata, fdobj)
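
Illustrative Sketch
-------------------

The discriminative subspace learning step from the method description can be illustrated with a
small NumPy sketch. It is not the BiFuncLib implementation: the helper name ``fisher_subspace`` is
hypothetical, it uses hard cluster labels instead of posterior probabilities, it works on plain
coefficient vectors, and it applies the classical Fisher criterion (between-cluster scatter against
within-cluster scatter). Under those assumptions, the subspace is spanned by the leading
eigenvectors of a generalized eigenvalue problem.

.. code-block:: python

    import numpy as np

    def fisher_subspace(coefs, labels, d):
        """Hypothetical helper: return a (p, d) basis of a discriminative subspace."""
        coefs = np.asarray(coefs, dtype=float)
        labels = np.asarray(labels)
        p = coefs.shape[1]
        overall_mean = coefs.mean(axis=0)
        s_within = np.zeros((p, p))
        s_between = np.zeros((p, p))
        for k in np.unique(labels):
            group = coefs[labels == k]
            group_mean = group.mean(axis=0)
            centered = group - group_mean
            # Scatter of the curves around their own cluster mean
            s_within += centered.T @ centered
            # Scatter of the cluster mean around the overall mean
            diff = (group_mean - overall_mean)[:, None]
            s_between += group.shape[0] * (diff @ diff.T)
        # Small ridge term (an assumption) keeps the within-cluster scatter invertible
        s_within += 1e-8 * np.eye(p)
        # Generalized eigenproblem S_b u = lambda * S_w u, solved as eig(S_w^{-1} S_b)
        eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
        order = np.argsort(eigvals.real)[::-1][:d]
        return eigvecs[:, order].real

    # Toy coefficients: two clusters that differ only in the first coordinate
    rng = np.random.default_rng(0)
    group_a = rng.normal(0.0, 1.0, size=(50, 4))
    group_b = rng.normal(0.0, 1.0, size=(50, 4))
    group_b[:, 0] += 10.0
    coefs = np.vstack([group_a, group_b])
    labels = np.repeat([0, 1], 50)

    U = fisher_subspace(coefs, labels, d=1)
    scores = coefs @ U  # curves projected onto the one-dimensional subspace

In this toy setting the recovered direction is dominated by the first coordinate, which is the only
one that separates the two groups. In FunFEM this update is interleaved with the M-step and E-step,
so the cluster memberships and the subspace are refined together.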